Overview
Music is an important part of entertainment. Some people say that music tells us what the artist wants to say, while popular music tells us what people want to hear. This project applies sentiment analysis to the lyrics of popular songs and to Twitter posts about those songs, in order to find out what successful songs and lyrics have in common and how the emotions in songs affect the audience.
Subquestions:
1. What are the most frequent words used in Popular Songs?
2. What Has Changed in Popular Songs?
3. Who are the most popular singers on Billboard in the last ten years?
4. Is there a pattern among the successful songs?
Data Collection and Data Cleaning
Music Data
This project collected music information and Twitter posts through several APIs.
billboard.py, a Python API for accessing music charts from Billboard.com, was used to collect song titles and artist names. With that information, PyLyrics, a Python module that retrieves song lyrics from lyrics.wikia.com, was used to look up the lyrics.
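The collection step could look roughly like the sketch below. It assumes the `ChartData` interface of billboard.py (entries exposing `rank`, `title`, `artist`, `peakPos` and `weeks`) and the `PyLyrics.getLyrics(artist, title)` call; the function names, the row layout, and the choice to treat a `ValueError` as "lyrics not found" are illustrative assumptions, not the project's actual code. The two network functions are not called here; the demo at the bottom uses a stand-in entry.

```python
from types import SimpleNamespace


def entries_to_rows(entries, chart_date):
    """Flatten chart entries into plain dicts, one row per song."""
    return [
        {
            "date": chart_date,
            "rank": e.rank,
            "title": e.title,
            "artist": e.artist,
            "peak_pos": e.peakPos,
            "weeks": e.weeks,
        }
        for e in entries
    ]


def fetch_chart_rows(chart_date, chart_name="hot-100"):
    """Download one weekly chart (needs network access and billboard.py)."""
    import billboard  # third-party package: billboard.py

    chart = billboard.ChartData(chart_name, date=chart_date)
    return entries_to_rows(chart.entries, chart_date)


def fetch_lyrics(artist, title):
    """Return the lyrics text, or None when the lookup fails."""
    from PyLyrics import PyLyrics  # third-party package: PyLyrics

    try:
        return PyLyrics.getLyrics(artist, title)
    except ValueError:
        # Assumption: a failed lookup surfaces as ValueError.
        return None


# Offline demo with a stand-in entry instead of a real chart download.
fake_entries = [
    SimpleNamespace(rank=1, title="Song A", artist="Artist A", peakPos=1, weeks=12)
]
rows = entries_to_rows(fake_entries, "2017-01-07")
```

Keeping the flattening step separate from the download makes it possible to check the row layout without touching the network.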
The biggest data-cleaning problem in this part was the special characters in the titles and artist lists:
Graph 1(Tableau)
We can see that very few songs use languages other than English. Songs in other languages were removed so that the sentiment analysis would be more accurate.
Comparison of the data before and after cleaning
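As a rough illustration of this kind of cleaning, the sketch below folds special characters to plain ASCII, keeps only the first credited artist, and applies a crude language filter. All names, the separator list, and the 0.9 threshold are assumptions made for the example. The filter only counts ASCII letters, so it would catch lyrics in non-Latin scripts but not, say, Spanish lyrics; a real pipeline would need a proper language detector.

```python
import re
import unicodedata

# Assumed separators that introduce additional artists in a chart credit.
_FEATURE_SPLIT = re.compile(
    r"\s+(?:featuring|feat\.?|ft\.?)\s+|\s*&\s*|\s*,\s*", re.IGNORECASE
)


def normalize_text(text):
    """Fold accents and curly quotes to plain ASCII and collapse whitespace."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


def primary_artist(artist_credit):
    """Keep only the first credited artist, e.g. for lyric lookups."""
    return _FEATURE_SPLIT.split(normalize_text(artist_credit), maxsplit=1)[0]


def looks_english(lyrics, threshold=0.9):
    """Crude filter: share of alphabetic characters that are plain ASCII."""
    letters = [ch for ch in lyrics if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(ch.isascii() for ch in letters)
    return ascii_letters / len(letters) >= threshold
```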

Graph 4(Tableau)
1. What are the most frequent words used in Popular Songs?
Let’s start looking for answers with some exploratory data analysis (EDA) plots. For the first sub-question, our team simply wants to find out which words are used most frequently in the lyrics. We start with a word cloud of all the songs in the Billboard Top 100.
Graph 8(matplotlib, Word Cloud), Source Code
The plot shows that many of the most frequent words still relate to love and romance. In addition, higher-ranking songs contain fewer offensive words; one possible reason is that these songs reach a wider audience, and listeners are less accepting of lyrics full of offensive words.
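The counts behind a word cloud like this can be sketched with the standard library. The stop-word list and the two sample lyric strings below are made up for the example, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "the", "i", "you", "me", "my",
    "it", "to", "of", "in", "on", "is",
}


def word_frequencies(lyrics_list, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all lyrics."""
    counts = Counter()
    for lyrics in lyrics_list:
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        counts.update(t for t in tokens if t not in stop_words and len(t) > 1)
    return counts


# Made-up sample lyrics, for illustration only.
sample = ["Love me, love me", "I need your love"]
freqs = word_frequencies(sample)
top_word = freqs.most_common(1)
```

A frequency table like `freqs` can then be handed to a plotting library; the `wordcloud` package, for instance, accepts one through `WordCloud().generate_from_frequencies(freqs)`.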
2. What Has Changed in Popular Songs?
Next, our team wants to study how popular music has developed over the years. The plots show how song names have changed and what kinds of songs stay on the Billboard chart longer.
Source: Billboard Top 100 Weekly List, 1990 - 2010
Data Grabbed: Song Name, Artist, Number of Weeks Stayed on Billboard, Year, Peak Position, and Lyrics
1. How Did Song Names Change from 1990 to 2017?
- Calculated the Average Number of Words in Song Names and Visualized by Year
2. Is There Any Difference Between the Top 50 Songs and the Bottom 50 Songs?
- This time we calculated the average number of words in song names again, but split by Top 50 and Bottom 50
- The Bottom 50, the relatively less popular songs, show more variation
3. What Songs Are More Likely to Stay Popular for Longer?
- Measured by Number of Weeks Stayed on Billboard
4. What Lyrics Made Songs Popular from 2008 - 2018?
- Ranked artists by the number of their songs that were ever on Billboard
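The title-length calculations in questions 1 and 2 above can be sketched as follows. The row layout (`year`, `title`, `peak_pos`) and the use of peak position to split Top 50 from Bottom 50 are assumptions for the example; the sample rows are placeholders, not chart data.

```python
from collections import defaultdict
from statistics import mean


def avg_title_words(rows, split_rank=None):
    """Average words per title by year, optionally split into chart halves."""
    groups = defaultdict(list)
    for row in rows:
        key = row["year"]
        if split_rank is not None:
            # Assumption: the split uses the song's peak position.
            half = "top" if row["peak_pos"] <= split_rank else "bottom"
            key = (row["year"], half)
        groups[key].append(len(row["title"].split()))
    return {key: mean(counts) for key, counts in sorted(groups.items())}


# Placeholder rows, for illustration only.
rows = [
    {"year": 1990, "title": "One", "peak_pos": 1},
    {"year": 1990, "title": "Two Words", "peak_pos": 60},
    {"year": 1991, "title": "Three Word Title", "peak_pos": 10},
]
by_year = avg_title_words(rows)
by_year_and_half = avg_title_words(rows, split_rank=50)
```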
3. Who are the most popular singers on Billboard in the last ten years?
Do singers with more songs on the chart rank higher on the board?
Graph 16(Tableau)
According to the plot above, the answer is no. Now let us consider the relationship between the number of songs a singer has on the chart and the number of weeks those songs spend on the Billboard Top 100.

Graph 17(Tableau)
There appears to be a logarithmic relationship between a singer's song count and the mean total weeks their songs spend on the Billboard chart.
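One way to check such a relationship is to fit y = a + b * ln(x) by ordinary least squares, where x is the song count and y the mean total weeks. The sketch below is not the method used for Graph 17, which the source does not describe; the demo values are synthetic and generated from a known curve only to show that the fit recovers it.

```python
import math


def fit_log_curve(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    sxx = sum((v - mean_lx) ** 2 for v in lx)
    sxy = sum((v - mean_lx) * (y - mean_y) for v, y in zip(lx, ys))
    b = sxy / sxx
    a = mean_y - b * mean_lx
    return a, b


# Synthetic points lying exactly on y = 3 + 2 * ln(x).
xs = [1, 2, 4, 8, 16]
ys = [3 + 2 * math.log(x) for x in xs]
a, b = fit_log_curve(xs, ys)
```

If the points are close to logarithmic, plotting y against ln(x) should give a roughly straight line with slope b.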
4. What are the characteristics of popular songs’ lyrics?
Taylor Swift Example

Graph 18(ggplot), Source Code
The average word count per track is close to 375, and the chart shows that most songs fall between 345 and 400 words. The density plot shows that the distribution is close to normal.

Graph 19(ggplot), Source Code
Overall, the most frequent mood in the songs is positive. We can also see that Taylor Swift has expressed all kinds of emotions in her songs, with joy, anticipation and trust emerging as the top three.
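Emotion scores of this kind are typically obtained by looking up each lyric word in a word-emotion lexicon and tallying the tags. The source does not say which lexicon was used, so the miniature lexicon below is entirely hypothetical and only illustrates the counting step.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon: word -> emotion/sentiment tags.
TOY_LEXICON = {
    "love": {"joy", "positive", "trust"},
    "smile": {"joy", "positive"},
    "wait": {"anticipation"},
    "promise": {"trust", "positive"},
    "cry": {"sadness", "negative"},
    "hate": {"anger", "disgust", "negative"},
}


def emotion_counts(lyrics, lexicon=TOY_LEXICON):
    """Tally emotion tags for every lyric word found in the lexicon."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", lyrics.lower()):
        counts.update(lexicon.get(token, ()))
    return counts


# Made-up line, for illustration only.
counts = emotion_counts("I love you, I promise, don't cry")
```

Dividing each tally by the total number of tagged words in a year would give per-year shares that can be compared across years.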

Graph 22(ggplot), Source Code
We can see that joy has its largest share in 2010 and 2014. Overall, surprise, disgust and anger are the emotions with the lowest scores; however, compared with other years, 2017 has the largest share of disgust. Anticipation has a higher share in 2010 and 2012 than in other years.
Some other visualizations, drawn with seaborn, show the popularity of the songs and how it changed over the years and even months.
Graph 24(Seaborn), Source Code
The graph above shows the density of songs over relative time (x-axis) against the number of weeks they stayed on the board. It is easy to see that the graph has two high-density regions, both in the middle of the period. Around 2014, many great songs were released. Around 2012, many songs were also released, but they did not stay on the board as long as the songs released in 2014.
Graph 25(Seaborn), Source Code
The graph above shows the number of favorites and retweets for the songs over the ten years. There is a huge difference between the periods before and after 2017: after 2017, the number of favorites and retweets increased dramatically. This suggests that either the songs on the board have had more influence since 2017 or people have become much more active on Twitter.
Graph 26(Seaborn), Source Code
The graph above shows the distribution of songs’ popularity in each year. 2017 has the highest mean, which means that songs released in 2017 were on average more popular than those of other years. For 2008, however, the maximum is very high but the mean is low, which means that year had both very popular and unpopular songs.
Graph 27(Seaborn), Source Code
The graph above shows the popularity of the songs in each month over these years. We expected to see trends through the changes of color in the graph. The result shows a slight pattern of increasing popularity from 2008 to around 2013; after that, the popularity of the songs in each month became unpredictable.